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Abstract The author's presentation of multilevel Monte Carlo path simulation at the 
MCQMC 2006 conference stimulated a lot of research into multilevel Monte Carlo 
methods. This paper reviews the progress since then, emphasising the simplicity, 
flexibility and generality of the multilevel Monte Carlo approach. It also offers a 
few original ideas and suggests areas for future research. 



1 Introduction 

1.1 Control variates and two-level MLMC 
(N 

C*|* , One of the classic approaches to Monte Carlo variance reduction is through the use 

of a control variate. Suppose we wish to estimate E[f], and there is a control variate 
g which is well correlated to / and has a known expectation K[g]. In that case, we 
can use the following unbiased estimator for E[/]: 



AT' £{/(«)_ A (>)_ Ek] )}. 



/\ . The optimal value for X is p \fV\f] /V[g], where p is the correlation between / 

2 | and g, and the variance of the control variate estimator is reduced by factor 1 — p 2 

compared to the standard estimator. 

A two-level version of MLMC (multilevel Monte Carlo) is very similar. If we 
want to estimate K[P\] but it is much cheaper to simulate Pq bs Pi, then since 

E[P 1 ]=E[P ]+E{P l -P \ 
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we can use the unbiased two-level estimator 

N l Z P o n) + N ^L( P i ,] - p o" : 

n=\ n=\ 

Here P\ — Pi represents the difference between Pi and Po for the same underlying 

stochastic sample, so that P; ' —Pq is small and has a small variance; the precise 
construction depends on the application and various examples will be shown later. 
The two key differences from the control variate approach are that the value of E[Po] 
is not known, so has to be estimated, and we use A = 1 . 

If we define C§ and C\ to be the cost of computing a single sample of Pq and 
Pi— Po, respectively, then the total cost is No Q) +N\ C\ , and if Vq and V\ are the 
variance of Po and Pi — Po, then the overall variance is NZ Vq +N^ Vi, assuming 

No N\ 

that y\ Pq' and V (P} —Po ) use independent samples. 

n=\ n=\ 

Hence, treating the integers Nq,N\ as real variables and performing a constrained 
minimisation using a Lagrange multiplier, the variance is minimised for a fixed cost 
by choosing N l /N = ^hfc[ / ^V Q /C . 



1.2 Multilevel Monte Carlo 

The full multilevel generalisation is quite natural: given a sequence Po,Pi , . . . , which 
approximates Pi with increasing accuracy, but also increasing cost, we have the 
simple identity 

L 

E[P L ]=E[Po] + £E[P £ <-P^i], 
l=\ 

and therefore we can use the following unbiased estimator for E[Pl], 

No , , L ( N t 



V 



I^' + lklff'-* 1 



H=l {=1 { 11=1 

with the inclusion of the level £ in the superscript (t,n) indicating that the samples 
used at each level of correction are independent. 

If we define Co, Vo to be the cost and variance of one sample of Po, and Q, Vg to be 
the cost and variance of one sample of Pi— Pt-\, then the overall cost and variance 

L L 

of the multilevel estimator is V N( <Q and T^A^r 1 Vi, respectively. 

e=o e=o 

For a fixed cost, the variance is minimised by choosing Nt = A y/V( / Q for some 
value of the Lagrange multiplier A. In particular, to achieve an overall variance of 
e 2 requires that A = £~ 2 Lf=o \fVe~Ct- The total computational cost is then 
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L 
1 

a ; =o 



c = £- 2 £VWQ • (i) 



It is important to note whether the product V( Q increases or decreases with t, 
i.e. whether or not the cost increases with level faster than the variance decreases. 
If it increases with level, so that the dominant contribution to the cost comes from 
VlCl then we have C s=s E~ 2 VlCl, whereas if it decreases and the dominant con- 
tribution comes from VqCq then C « e~ 2 VoCq. This contrasts to the standard MC 
cost of approximately £~ 2 Vb Q,, assuming that the cost of computing Pi is similar 
to the cost of computing Pl—Pl-\, and that Y[Pl] ~ V[Po]- This shows that in the 
first case the MLMC cost is reduced by factor Vl/Vq, corresponding to the ratio of 
the variances Y[Pl~ Pl-i] an d V[J°l], whereas in the second case it is reduced by 
factor Cq/Cl, the ratio of the costs of computing Pq and Pl—Pl-i- If the product 
Vf Q does not vary with level, then the total cost is £~ 2 L 2 VqCq = £~ 2 L 2 VlCl- 



1.3 Earlier related work 

Prior to the author's first publications [20. 21 1 on MLMC for Brownian path sim- 
ulations, Heinrich developed a multilevel Monte Carlo method for parametric inte- 
gration, the evaluation of functionals arising from the solution of integral equations, 
and weakly singular integral operators ll33l 134] l35l l36l l37l . Parametric integration 
concerns the estimation of E[/(x, A)] where x is a finite-dimensional random vari- 
able and A is a parameter. In the simplest case in which A is a real variable in the 
range [0,1], having estimated the value of E[/(jc,0)] and E[/(*, 1)], one can use 
j(f(x,0) +f(x,l)) as a control variate when estimating the value of E[/(x, i)]. 
This approach can then be applied recursively for other intermediate values of A, 
yielding large savings if f(x,X) is sufficiently smooth with respect to A. Although 
this does not quite fit into the general MLMC form given in the previous section, 
the recursive control variate approach is very similar and the complexity analysis is 
also very similar to the analysis to be presented in the next section. 

Although not so clearly related, there are papers by Brandt et al (9] [10) which 
combine Monte Carlo techniques with multigrid ideas in determining thermody- 
namic limits in statistical physics applications. It is the multigrid ideas of Brandt 
and others for the iterative solution of systems of equations which were the inspira- 
tion for the author in developing the MLMC method for SDE path simulation. 

In 2005, Kebaier f4TTl developed a two-level approach for path simulation which 
is very similar to the author's approach presented in the next section. The only dif- 
ferences are the use of only two levels, and the use of a general multiplicative factor 
as in the standard control variate approach. A similar multilevel approach was under 
development at the same time by Speight, but was not published until later | 



Michael B. Giles 



2 MLMC theorem 



In the Introduction, we considered the case of a general multilevel method in which 
the output Pl on the finest level corresponds to the quantity of interest. However, in 
many infinite-dimensional applications, such as in SDEs and SPDEs, the output P( 
on level t is an approximation to a random variable P. In this case, the mean square 
error (MSE) has the usual decomposition into the total variance of the multilevel 
estimator, plus the square of the bias (E{Pl— P]) 2 - To achieve an MSE which is less 
than e 2 , it is sufficient to ensure that each of these terms is less than ie 2 . This leads 
to the following theorem: 

Theorem 1. Let P denote a random variable, and let P( denote the corresponding 
level £ numerical approximation. 

If there exist independent estimators Yi based on N( Monte Carlo samples, and 
positive constants a,/3,y,ci,C2,C3 such that (X>\ min(/3, y) and 

i) \E[P e -P}\ < c\2- al 

ii) E[Y e ] = { 

[Ept-Pt-i], £>Q 

Hi) ¥[Y £ ] < c 2 Ni l 2-P i 

iv) E[Q] < c^N(2^ , where C( is the computational complexity of Y( 

then there exists a positive constant c 4 such that for any £<e there are values L 
and N( for which the multilevel estimator 



Y=Z,Y e , 

has a mean-square-error with bound 

MSE = e\(Y-E[P}) 2 ] <e 2 



with a computational complexity C with bound 



c 4 £~ 



c 4 e 



-2-(y-p)/a 



J3>y, 



E[C}<{ c 4 £- 2 (log£) 2 , J3 = y, 



, J3<7- 



The statement of the theorem is a slight generalisation of the original theorem in 
ETI . It corresponds to the theorem and proof in |fj"5l , except for the minor change to 
expected costs to allow for applications such as jump-diffusion modelling in which 
the simulation cost of individual samples is itself random. 

The theorem is based on the idea of a geometric progression in the levels of ap- 
proximation, leading to the exponential decay in the weak error in condition i), and 
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the variance in condition Hi), as well as the exponential increase in the expected cost 
in condition iv). This geometric progression was based on experience with multigrid 
methods in the iterative solution of large systems of linear equations, but it is worth 
noting that it is not necessarily the optimal choice in all circumstances. 

The result of the theorem merits some discussion. In the case /3 > 7, the dominant 
computational cost is on the coarsest levels where Q = (9(1) and (9(e~ 2 ) samples 
are required to achieve the desired accuracy. This is the standard result for a Monte 
Carlo approach using i.i.d. samples; to do better would require an alternative ap- 
proach such as the use of Latin hypercube sampling or quasi-Monte Carlo methods. 
In the case j3 < 7, the dominant computational cost is on the finest levels. Because 
of condition i), 2~ aL = 0(e), and hence Cl = 0(e~ y l a ). If j3 = 2a, which is usually 
the largest possible value for a given a, for reasons explained below, then the total 
cost is 0(Cl) corresponding to 0(1) samples on the finest level, again the best that 
can be achieved. The dividing case /3 = 7 is the one for which both the computa- 
tional effort, and the contributions to the overall variance, are spread approximately 
evenly across all of the levels; the (loge) 2 term corresponds to the L 2 factor in the 
corresponding discussion in section [L2l 

The natural choice for the multilevel estimator is 

Yt=Ni i Y,Pe(co,)-P ( -i(co,), (2) 

i 

where P((coi) is the approximation to P(cOi) on level £, and P^_i(o,) is the corre- 
sponding approximation on level i—\ for the same underlying stochastic sample 
(0{. Note that V[P£— Pt~i] is usually similar in magnitude to E[(P(<— Pf_i) 2 ] which is 
greater than (E[P^— i^_i]) 2 ; this implies that j3 < 2a and hence the condition in the 
theorem that a > \ min(/3, 7) is satisfied. 

However, the multilevel theorem allows for the use of other estimators, provided 
they satisfy the restriction of condition ii) which ensures that E[T] = E[P/J. Two 
examples of this will be given later in the paper. In the first, slightly different nu- 
merical approximations are used for the coarse and fine paths in SDE simulations, 
giving 

i 

Provided E[P^'] = E[P|] so that the expectation on level £ is the same for the two 
approximations, then condition ii) is satisfied and no additional bias (other than the 
bias due to the approximation on the finest level) is introduced into the multilevel 
estimator. The second example defines an antithetic cof with the same distribution 
as a>i, and then uses the multilevel estimator 

Yi>=Nf l £ \(Pt{eH)+P t (ai?))-Pt-i(aH). 

i 

Since E[P^(fi)?)] = E[i^ (©,-)], then again condition ii) is satisfied. In each case, the 
objective in constructing a more complex estimator is to achieve a greatly reduced 
variance Y[Y(] so that fewer samples are required. 
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3 SDEs 



3.1 Euler discretisation 



The original multilevel path simulation paper 11211 treated SDEs using the simple 
Euler-Mamyama discretisation together with the natural multilevel estimator (0. 

Provided the SDE satisfies the usual conditions (see Theorem 10.2.2 in 11421 ). the 
strong error for the Euler discretisation with timestep h is 0(h}' 2 ), and therefore 
for Lipschitz payoff functions P (such as European, Asian and lookback options in 
finance) the variance Vi = V[P£—Pi—i] is 0(h£). If hi = 4~^/zq, as in J2T1 , then this 
gives a=2, j3 =4 and 7=2. Alternatively, if he = 2~ e ho, then a=l, J3 =2 and 7=1. 
In either case, Theorem[TJgives the complexity to achieve a root-mean-square error 
of e to be <9(e~ 2 (log e) 2 ), which is near-optimal as Muller-Gronbach & Ritter have 
proved an 0(e~ 2 ) lower bound for the complexity 11461 . 

For other payoff functions the complexity is higher. V( m 0(h 1 ' 2 ) for the digital 
option which is a discontinuous function of the SDE solution at the final time, and 
the barrier option which depends discontinuously on the minimum or maximum 
value overthe full time interval. Loosely speaking, this is because there is an 0(h 1 ' 2 ) 
probability of the coarse and fine paths being on opposite sides of the discontinuity, 
and in such cases there is an 0(1) difference in the payoff. Currently, there is no 
known "fix" for this for the Euler-Maruyama discretisation; we will return to this 
issue for the Milstein discretisation when there are ways of improving the situation. 

Table [TJ summarises the observed variance convergence rate in numerical exper- 
iments for the different options, and the theoretical results which have been ob- 
tained; the digital option analysis is due to Avikainen [4| while the others are due 
to Giles, Higham & Mao (24l . Although the analysis in some of these cases is for 
one-dimensional SDEs, it also applies to multi-dimensional SDEs 





Euler-Maruyama 


Milstein 


option 


numerics 


analysis 


numerics 


analysis 


Lipschitz 


0(h) 


0(h) 


oik 1 ) 


0(h 2 ) 


Asian 


0(h) 


0(h) 


0(h 2 ) 


0(h 2 ) 


lookback 


0(h) 


0(h) 


0(h 2 ) 


o(h 2 - s ) 


barrier 


0(h x l 2 ) 


o(h 1 ' 2 - 5 ) 


0(hV 2 ) 


o^l 2 - 5 ) 


digital 


0(h l l 2 ) 


0(h x l 2 \ogh) 


0(/j 3 / 2 ) 


o(hV 2 - 3 ) 



Table 1 Observed and theoretical convergence rates for the multilevel correction variance for 
scalar SDEs, using the Euler-Maruyama and Milstein discretisations. 5 is any strictly positive 
constant. 
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3.2 Milstein discretisation 

For Lipschitz payoffs, the variance V( for the natural multilevel estimator converges 
at twice the order of the strong convergence of the numerical approximation of 
the SDE. This immediately suggests that it would be better to replace the Euler- 
Maruyama discretisation by the Milstein discretisation J20l since it gives first order 
strong convergence under certain conditions (see Theorem 10.3.5 in l42l ). 

This immediately gives an improved variance for European and Asian options, as 
shown in Table Q] but to get the improved variance for lookback, barrier and digital 
options requires the construction of estimators which are slightly different on the 
coarse and fine path simulations, but which respect the condition that E[K] = E[Pf] . 

The construction for the digital option will be discussed next, but for the lookback 
and barrier options, the key is the definition of a Brownian Bridge interpolant based 
on the approximation that the drift and volatility do not vary within the timestep. 
For each coarse timestep, the mid-point of the interpolant can be sampled using 
knowledge of the fine path Brownian increments, and then classical results can be 
used for the distribution of the minimum or maximum within each fine timestep for 
both the fine and coarse path approximations [29 1. The full details are given in [20], 
and Table Q] summarises the convergence behaviour observed numerically, and the 
supporting numerical analysis by Giles, Debrabant & RoBler 11231 . 

The outcome is that for the case in which the number of timesteps doubles at each 
level, so h£ — 2~ t liQ, then y= 1 and either /3 =2 (European, Asian and lookback) 
or j3 = 1.5 (barrier and digital). Hence, we are in the regime where /3 > y and the 
overall complexity is (9(e~ 2 ). Furthermore, the dominant computational cost is on 
the coarsest levels of simulation. 

Since the coarsest levels are low-dimensional, they are well suited to the use of 
quasi-Monte Carlo methods which are particularly effective in lower dimensions 
because of the existence of 0([\ogN) d /N) error bounds, where d is the dimension 
and TV is the number of QMC points. The bounds are for the numerical integration 
of certain function classes on the unit hypercube, and are a consequence of the 
Koksma-Hlawka inequality together with bounds on the star-discrepancy of certain 
sequences of QMC points. 

This has been investigated by Giles & Waterhouse ||281 using a rank-1 lattice 
rule to generate the quasi-random numbers, randomisation with 32 independent off- 
sets to obtain confidence intervals, and a standard Brownian Bridge construction of 
the increments of the driving Brownian process. The numerical results show that 
MLMC on its own was better than QMC on its own, but the combination of the two 
was even better. The QMC treatment greatly reduced the variance per sample for the 
coarsest levels, resulting in significantly reduced costs overall. In the simplest case 
of a Lipschitz European payoff, the computational complexity was reduced from 
<9(e~ 2 ) to approximately 0(s~ is ). 
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3.2.1 Digital options 

As discussed earlier, discontinuous payoffs pose a challenge to the multilevel Monte 
Carlo approach, because small differences in the coarse and fine path simulations 
can lead to an (9(1) difference in the payoff function. This leads to a slower decay 
in the variance Vg, and because the fourth moment is also much larger it leads to 
more samples being required to obtain an accurate estimate for Vg, which is needed 
to determine the optimal number of samples Ng. 

This is a generic problem. Although we will discuss it here in the specific context 
of a Brownian SDE and an option which is a discontinuous function of the under- 
lying at the final time, the methods which are discussed are equally applicable in a 
range of other cases. Indeed, some of these techniques have been first explored in 
the context of pathwise sensitivity analysis fl2l or jump-diffusion modelling |52l . 

Conditional expectation 

The conditional expectation approach builds on a well-established technique for 
payoff smoothing which is used for pathwise sensitivity analysis (see, for example, 
pp. 399-400 in H29l ). 

We start by considering the fine path simulation, and make a slight change by 
using the Euler-Maruyama discretisation for the final timestep, instead of the Mil- 
stein discretisation. Conditional on the numerical approximation of the value Sj-h 
one timestep before the end (which in turn depends on all of the Brownian incre- 
ments up to that time) the numerical approximation for the final value St now has a 
Gaussian distribution, and for a simple digital option the conditional expectation is 
known analytically. 

The same treatment is used for the coarse path, except that in the final timestep, 
we re-use the known value of the Brownian increment for the second last fine 
timestep, which corresponds to the first half of the final coarse timestep. This results 
in the conditional distribution for the coarse path underlying at maturity matching 
that of the fine path to within 0(h), for both the mean and the standard deviation 
l23l . Consequently, the difference in payoff between the coarse and fine paths near 
the payoff discontinuity is 0(h 1 ' 2 ), and so the variance is approximately <9(/j 3 ' 2 ). 

Splitting 

The conditional expectation technique works well in ID where there is a known 
analytic value for the conditional expectation, but in multiple dimensions it may not 
be known. In this case, one can use the technique of "splitting" Q. Here the condi- 
tional expectation is replaced by a numerical estimate, averaging over a number of 
sub-samples, i.e. for each set of Brownian increments up to one fine timestep before 
the end, one uses a number of samples of the final Brownian increment to produce 
an average payoff. If the number of sub-samples is chosen appropriately, the vari- 
ance is the same, to leading order, without any increase in the computational cost, 
again to leading order. Because of its simplicity and generality, this is now my pre- 
ferred approach. Furthermore, one can revert to using the Milstein approximation 
for the final timestep. 
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Change of measure 

The change of measure approach is another approximation to the conditional 
expectation. The fine and coarse path conditional distributions at maturity are two 
very similar Gaussian distributions. Instead of following the splitting approach of 
taking corresponding samples from these two distributions, we can instead take a 
sample from a third Gaussian distribution (with a mean and variance perhaps equal 
to the average of the other two). This leads to the introduction of a Radon-Nikodym 
derivative for each path, and the difference in the payoffs from the two paths is then 
due to the difference in their Radon-Nikodym derivatives. 

In the specific context of digital options, this is a more complicated method to 
implement, and the resulting variance is no better. However, in other contexts a 
similar approach can be very effective. 



3.2.2 Multi-dimensional SDEs 

The discussion so far has been for scalar SDEs, but the computational benefits of 
Monte Carlo methods arise in higher dimensions. For multi-dimensional SDEs sat- 
isfying the usual commutativity condition (see, for example, p. 353 in ll29l ) the Mil- 
stein discretisation requires only Brownian increments for its implementation, and 
most of the analysis above carries over very naturally. 

The only difficulties are in lookback and barrier options where the classical re- 
sults for the distribution of the minimum or maximum of a one- dimensional Brow- 
nian motion, do not extend to the joint distribution of the minima or maxima of two 
correlated Brownian motions. An alternative approach may be to sub-sample from 
the Brownian Bridge interpolant for those timesteps which are most likely to give 
the global minimum or maximum. This may need to be combined with splitting for 
the barrier option to avoid the 0(1) difference in payoffs. An alternative might be 
to use adaptive time-stepping [40]. 

For multi-dimensional SDEs which do not satisfy the commutativity condition 
the Milstein discretisation requires the simulation of Levy areas. This is unavoidable 
to achieve first order strong convergence; the classical result of Clark & Cameron 
says that 0(h 1 ' 2 ) strong convergence is the best that can be achieved in general 
using just Brownian increments lfl4l . 

However, Giles & Lukasz have developed an antithetic treatment which achieves 
a very low variance despite the 0(h 1 ' 2 ) strong convergence [26|. The estimator 
which is used is 

Yt=Nt 1 L 5te(«*)+^«))-^-i(fl%)- 

i 

Here C0i represents the driving Brownian path, and (of is an antithetic counterpart 
defined by a time-reversal of the Brownian path within each coarse timestep. This 
results in the Brownian increments for the antithetic fine path being swapped rel- 
ative to the original path. Lengthy analysis proves that the average of the fine and 



10 Michael B.Giles 

antithetic paths is within 0(h) of the coarse path, and hence the multilevel variance 
is 0(h 2 ) for smooth payoffs, and 0(h 3 ' 2 ) for the standard European call option. 

This treatment has been extended to handle lookback and barrier options J27). 
This combines sub-sampling of the Brownian path to approximate the Levy areas 
with sufficient accuracy to achieve (9(/i 3 ' 4 ) strong convergence, with an antithetic 
treatment at the finest level of resolution to ensure that the average of the fine paths 
is within 0(h) of the coarse path. 



3.3 Levy processes 

3.3.1 Jump-diffusion processes 

With finite activity jump-diffusion processes, such as in the Merton model P4l . it is 
natural to simulate each individual jump using a jump-adapted discretisation l47l . 

If the jump rate is constant, then the jumps on the coarse and fine paths will occur 
at the same time, and the extension of the multilevel method is straightforward ll52) . 

If the jump rate is path-dependent then the situation is trickier. If there is a known 
upper bound to the jump rate, then one can use Glasserman & Merener's "thinning" 
approach OTl in which a set of candidate jump times is simulated based on the 
constant upper bound, and then a subset of these are selected to be real jumps. The 
problem with the multilevel extension of this is that some candidate jumps will be 
selected for the coarse path but not for the fine path, or vice versa, leading to an 
0(1) difference in the paths and hence the payoffs. Xia overcomes this by using 
a change of measure to select the jump times consistently for both paths, with a 
Radon-Nikodym derivative being introduced in the process 



3.3.2 More general processes 

With infinite activity Levy processes it is impossible to simulate each jump. One 
approach is to simulate the large jumps and either neglect the small jumps or ap- 
proximate their effect by adding a Brownian diffusion term ifTTl [181 l43l . Following 
this approach, the cutoff Sg for the jumps which are simulated varies with level, 
and 8( — s- as £ — >• °° to ensure that the bias converges to zero. In the multilevel 
treatment, when simulating P( — Pf_ i the jumps fall into three categories. The ones 
which are larger than 5g_j get simulated in both the fine and coarse paths. The ones 
which are smaller than S( are either neglected for both paths, or approximated by 
the same Brownian increment. The difficulty is in the intermediate range [Si, 8g-i] 
in which the jumps are simulated for the fine path, but neglected or approximated for 
the coarse path. This is what leads to the difference in path simulations, and hence 
to a non-zero value for P$ — P t >_ \ . 

Alternatively, for many SDEs driven by a Levy process it is possible to directly 
simulate the increments of the Levy process over a set of uniform timesteps | 
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in exactly the same way as one simulates Brownian increments. For other Levy 
processes, it may be possible in the future to simulate the increments by constructing 
approximations to the inverse of the cumulative distribution function. Where this is 
possible, it may be the best approach to achieve a close coupling between the coarse 
and fine path simulations, and hence a low variance Vg, since the increments of the 
driving Levy process for the coarse path can be obtained trivially by summing the 
increments for the fine path. 



4 SPDEs 

After developing the MLMC method for SDE simulations, it was immediately clear 
that it was equally applicable to SPDEs, and indeed the computational savings 
would be greater because the cost of a single sample increases more rapidly with 
grid resolution for SPDEs with higher space-time dimension. 

In 2006, the author discussed this with Thomas Hou in the specific context of 
elliptic SPDEs with random coefficients, and Hou's postdoc then performed the 
first unpublished MLMC computations for SPDEs. The first published work was by 
a student of Klaus Ritter in her Diploma thesis l32l ; her application was to parabolic 
SPDEs. Since this early work, there has been a variety of papers on elliptic l6l[T3l 
□3] ED, parabolic BED and hyperbolic |45) SPDEs. 

In almost all of this work, the construction of the multilevel estimator is quite 
natural, using a geometric sequence of grids and the usual estimators for Pg— Pt-\. 
It is the numerical analysis of the variance of the multilevel estimator which is often 
very challenging. 



4.1 Elliptic SPDE 

The largest amount of research on multilevel for SPDEs has been for elliptic PDEs 
with random coefficients. The PDE typically has the form 

-V • (k(x, co)Vp(x, co)) = 0, xeD. 

with Dirichlet or Neumann boundary conditions on the boundary dD. For sub- 
surface flow problems, such as the modelling of groundwater flow in nuclear waste 
repositories, the diffusivity (or permeability) lc is often modelled as a lognormal ran- 
dom field, i.e. log A: is a Gaussian field with a uniform mean (which we will take to be 
zero for simplicity) and a covariance function of the general form R(x,y) = r(x— y). 
Samples of log A: are provided by a Karhunen-Loeve expansion: 

logk(x,eo) = £ ^/F n £, n (co) f n (x), 

n=0 
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where 9 n are the eigenvalues of R(x,y) in decreasing order, /„ are the corresponding 
eigenfunctions, and t, n are independent unit Normal random variables. However, it 
is more efficient to generate them using a circulant embedding technique which 
enables the use of FFTs [ 19 1. 

The multilevel treatment is straightforward. The spatial grid resolution is doubled 
on each level. Using the Karhunen-Loeve generation, the expansion is truncated 
after K( terms, with K( increasing with level ]3T); in unpublished work, a similar 
approach has also been used with the circulant embedding generation. 

In both cases, log A: is generated using a row-vector of independent unit Normal 
random variables £, . The variables for the fine level can be partitioned into those for 
the coarse level E,i-\, plus some additional variables zt, giving ^ = (<^_i,z^). It is 
possible to develop an antithetic treatment similar to that used for SDEs by defining 
E," = (4f_i,— Zf). This gives a second logA^' field on the fine grid, and then the 
multilevel estimator can be based on the average of the two outputs obtained on the 
fine grid, minus the output obtained on the coarse grid using log kg~\ . Unfortunately, 
numerical experiments indicate it gives little benefit; it is mentioned here as another 
illustration of an antithetic estimator, and as a warning that it does not always yields 
significant benefits. 

The numerical analysis of the multilevel approach for these elliptic SPDE appli- 
cations is challenging because the diffusivity is unbounded, but Charrier, Scheichl 
& Teckentrup Ifl3l have successfully analysed it for certain output functionals, and 
Teckentrup et al have further developed the analysis for other output functionals and 
more general log-normal diffusivity fields iBTI . 



4.2 Parabolic SPDE 



Giles & Reisinger 11251 consider an unusual SPDE from credit default modelling, 

dp 1 dp dp 

dp = -ju 3 i -df + - 3 -ydr- x /p 3^-dMf, x> 



dx 2 dx 2 d 



x 



subject to boundary condition p(Q,t) = 0. Here p(x,t) represents the probability 
density function for firms being a distance x from default at time t. The diffusive 
term is due to idiosyncratic factors affecting individual firms, while the stochastic 
term due to the scalar Brownian motion M t corresponds to the systemic movement 
due to random market effects affecting all firms. The payoff corresponds to different 
tranches of a credit derivative which depends on the integral JJ° p(x, t ) dx at a set of 
discrete times. 

A Milstein time discretisation with timestep k, and a central space discretisation 
of the spatial derivatives with uniform spacing h gives the numerical approximation 
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where p n - « p(jh,nk), and the Z„ are standard Normal random variables so that 

\fh Z„ corresponds to an increment of the driving scalar Brownian motion. 

The multilevel implementation is very straightforward, with k( = ki-i/2 and 
h( = h(i_\/4 due to numerical stability considerations which are analysed in the 
paper. As with SDEs, the coupling between the coarse and fine samples comes from 
summing the fine path Brownian increments in pairs to give the increments for the 
coarse path. The computational cost increases by factor 8 on each level, and nu- 
merical experiments indicate that the variance decreases by factor 8, so the overall 
computational complexity to achieve an 0(e) RMS error is again (9(£~ 2 (loge) 2 ). 



5 Continuous-time Markov Chain simulation 

Anderson & Higham have recently developed a very interesting new application of 
multilevel to continuous-time Markov Chain simulation |2)- Although they present 
their work in the context of stochastic chemical reactions, when species concentra- 
tions are extremely low and so stochastic effects become significant, they point out 
that the method has wide applicability in other areas. 

In the simplest case of a single chemical reaction, the "tau-leaping" method 
(which is essentially the Euler-Maruyama method, approximating the reaction rate 
as being constant throughout the timestep) gives the discrete equation 

x„+i = x n +P(hX(x n )), 

where h is the timestep, A (x„) is the reaction rate (or propensity function), and P(t) 
represents a unit-rate Poisson random variable over time interval t . 

If this equation defines the fine path in the multilevel simulation, then the coarse 
path, with double the timestep, is given by 



v «+2 



= x c n +P(2hX(x c n )) 



for even timesteps n. The question then is how to couple the coarse and fine path 
simulations. 

The key observation by Anderson & Higham J2] is that for any t\,t2 > 0, the 
sum of two independent Poisson variates P(t i ), P(ti) is equivalent in distribution to 
P{ti+t2j. Based on this, the first step is to express the coarse path Poisson variate 
as the sum of two Poisson variates, P(hX(x^)) corresponding to the first and second 
fine path timesteps. For the first of the two fine timesteps, the coarse and fine path 
Poisson variates are coupled by defining two Poisson variates based on the minimum 
of the two reactions rates, and the absolute difference, 

P l =p(hmm(X(x„),X(x' ll ))), P 2 = p(h\X(x n )-X(x c n )\ 
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and then using Pi as the Poisson variate for the path with the smaller rate, and 
Pi +P2 for the path with the larger rate. This elegant approach naturally gives a 
small difference in the Poisson variates when the difference in rates is small, and 
leads to a very effective multilevel algorithm. 

In their paper J3], Anderson & Higham treat more general systems with multiple 
reactions, and include an additional coupling at the finest level to an SSA (Stochas- 
tic Simulation Algorithm) computation, so that their overall multilevel estimator 
is unbiased, unlike the estimators discussed earlier for SDEs. Finally, they give a 
complete numerical analysis of the variance of their multilevel algorithm. 

Because stochastic chemical simulations typically involve 1000's of reactions, 
the multilevel method is particularly effective in this context, providing computa- 
tional savings in excess of a factor of 100 J2). 



6 Wasserstein metric 

In the multilevel treatment of SDEs, the Brownian or Levy increments for the coarse 
path are obtained by summing the increments for the fine path. Similarly, in the 
Markov Chain treatment, the Poisson variate for the coarse timestep is defined as 
the sum of two Poisson variates for fine timesteps. 

This sub-division of coarse path random variable into the sum of two fine path 
random variables should work in many settings. The harder step in more general ap- 
plications is likely to be the second step in the Markov Chain treatment, tightly cou- 
pling the increments used for the fine and coarse paths over the same fine timestep. 

The general statement of this problem is the following: given two very similar 
scalar probability distributions, we want to obtain samples Zf,Z c from each in a 
way which minimises E[|Zy— Z c \ p ]. This corresponds precisely to the Wasserstein 
metric which defines the "distance" between two probability distributions as 

i/p 

inf / \\Z f -Z c \\ p dy{Z f ,Z c ) 

where the minimum is over all joint distributions with the correct marginals. In ID, 
the Wasserstein metric is equal to 

1 P \ 1/p 



<2V>)-<^>) 



du 



where <Pf and <P C are the cumulative probability distributions for Zf and Z c 0, and 
this minimum is achieved by choosing Zf = <t>7 (U), Z c = <£>~ X (U), for the same 
uniform [0, 1] random variable U . This suggests this may be a good general tech- 
nique for future multilevel applications, provided one is able to invert the relevant 
cumulative distributions, possibly through generating appropriate spline approxima- 
tions. 
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7 Other uses of multilevel 
7.1 Nested simulation 

The pricing of American options is one of the big challenges for Monte Carlo meth- 
ods in computational finance, and Belomestny & Schoenmakers have recently writ- 
ten a very interesting paper on the use of multilevel Monte Carlo for this purpose 
0. Their method is based on Anderson & Broadie's dual simulation method [ 1 1 in 
which a key component at each timestep in the simulation is to estimate a condi- 
tional expectation using a number of sub-paths. 

In their multilevel treatment, Belomestny & Schoenmakers use the same uni- 
form timestep on all levels of the simulation. The quantity which changes between 
different levels of simulation is the number of sub-samples used to estimate the 
conditional expectation. To couple the coarse and fine levels, the fine level uses N( 
sub-samples, and the coarse level uses Afy_ i = N(/2 of them. 

Related unpublished research by N. Chen for a similar multilevel treatment of 
nested simulation found that the multilevel correction variance is reduced if the 
payoff on the coarse level is replaced by an average of the payoffs obtained using 
the first N(/2 and the second Afy/2 samples. This is similar in some ways to the 
antithetic approach described earlier. 

In future research, Belomestny & Schoenmakers intend to also change the num- 
ber of timesteps on each level, to increase the overall computational benefits of the 
multilevel approach. 



7.2 Truncated series expansions 

Building on earlier work by Broadie & Kaya ifTTI . Glasserman & Kim have re- 
cently developed an efficient method 1301 of exactly simulating the Heston stochas- 
tic volatility model [381. The key to their algorithm is a method of representing the 
integrated volatility over a time interval [0,T], conditional on the initial and final 
values, i'o and vr as 

T 

V s ds 

o 



Vo = vo,Vt=v t ) = Y. x " + Y. >'" + H z " 

n=\ n=\ n=\ 



where x n ,y„,z„ are independent random variables. 

In practice, they truncate the series expansions at a level which ensures the de- 
sired accuracy, but a more severe truncation would lead to a tradeoff between accu- 
racy and computational cost. This makes the algorithm a candidate for a multilevel 
treatment in which the level i computation performs the truncation at N(, so the level 
£ computation would use 
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E z « 


n=\ 




n=\ 




n=\ 



while the level I— 1 computation would truncate the summations at A^_i, but would 
use the same random variables x„,y„,z„ for 1 < n < Ne-i. 

This kind of multilevel treatment has not been tested experimentally, but it seems 
that it might yield some computational savings even though Glasserman & Kim 
typically only need to retain 10 terms in their summations through the use of a 
carefully constructed estimator for the truncated remainder. The savings may be 
larger in other circumstances which require more terms to be retained for the desired 
accuracy. 



7.3 Mixed precision arithmetic 

The final example of the use of multilevel is unusual, because it concerns the com- 
puter implementation of Monte Carlo algorithms. In the latest CPUs from Intel and 
AMD, each core has a vector unit which can perform 8 single precision or 4 double 
precision operations with one instruction. Also, double precision data takes twice as 
much time to transfer as single precision data. Hence, single precision computations 
can be twice as fast as double precision on CPUs, and the difference can be even 
greater on GPUs. This raises the question of whether single precision arithmetic is 
sufficient for Monte Carlo simulation. 

My view is that it usually is since the finite precision rounding errors are smaller 
than the other sources of error: statistical error due to Monte Carlo sampling; bias 
due to SDE discretisation; model uncertainty. However, there can be significant er- 
rors when averaging unless one uses binary tree summation ||39l to perform the 
summation, and in addition computing sensitivities by perturbing input parameters 
(so-called "bumping") can greatly amplify the rounding errors. 

The best solution is perhaps to use double precision for the final averaging, and 
pathwise sensitivity analysis or the likelihood ratio method for computing sensitiv- 
ities, but if there remains a need for the path simulation to be performed in double 
precision then one could use the two-level MLMC approach in which level cor- 
responds to single precision and level 1 corresponds to double precision, with the 
same random numbers being used for both. 



7.4 Multiple outputs 

In all of the discussion so far, we have been concerned with a single expectation aris- 
ing from a stochastic simulation. However, there are often times when one wishes 
to estimate the expected value of multiple outputs. 
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Extending the analysis in section [L2l when using multilevel to estimate M differ- 
ent expectations, using N/ samples on each level, the goal is to achieve an acceptably 
small variance for each output 

L 

Y, N i lv t,m < £ m, m=l,...,M, 

(=0 

with the desired accuracy e„, being allowed to vary from one output to another, and 
to do so with the minimum computational cost which is given as usual as 

L 
1=0 

assuming that the cost of computing the output functions is negligible compared to 
the cost of obtaining the stochastic sample (e.g. through an SDE path simulation). 

This leads naturally to a constrained optimisation problem with a separate La- 
grange multiplier for each output. However, a much simpler idea, due to Tigran 
Nagapetyan, which in practice is almost always equivalent, is to define 

,, Vt,m 

Vi = max — '^r- 

L 

and make the variance constraint V Nf V( < 1 . 

i=0 
This is sufficient to ensure that all of the individual constraints are satisfied, and 

we can then use the standard approach with a single Lagrange multiplier. This multi- 
output approach is currently being investigated by Nagapetyan, Ritter and the author 
for the approximation of cumulative distribution functions and probability density 
functions arising from stochastic simulations. 



8 Conclusions 

In the past 6 years, considerable progress has been achieved with the multilevel 
Monte Carlo method for a wide range of applications. This review has attempted to 
emphasise the conceptual simplicity of the multilevel approach; in essence it is sim- 
ply a recursive control variate strategy, using cheap approximations to some random 
output quantity as a control variate for more accurate but more costly approxima- 
tions. 

In practice, the challenge is to develop a tight coupling between successive ap- 
proximation levels, to minimise the variance of the difference in the output obtained 
from each level. In the context of SDE and SPDE simulations, strong convergence 
properties are often relied on to obtain a small variance between coarse and fine sim- 
ulations. In the specific context of a digital option associated with a Brownian SDE, 
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three treatments were described to effectively smooth the output: a analytic condi- 
tional expectation, a "splitting" approximation, and a change of measure. Similar 
treatments have been found to be helpful in other contexts. 

Overall, multilevel methods are being used for an increasingly wide range of ap- 
plications. The biggest savings are in situations in which the coarsest approximation 
is very much cheaper than the finest. So far, this includes multi-dimensional SPDEs, 
and chemical stochastic simulations with 1000's of timesteps. In SDE simulations 
which perhaps only require 32 timesteps for the desired level of accuracy, the po- 
tential savings are naturally quite limited. 

Although this is primarily a survey article, a few new ideas have been introduced: 

• equation (Q]i giving the total computational cost required for a general unbiased 
multilevel estimator is new, as is the discussion which follows it, although the 
underlying analysis is not; 

• based on the ID Wasserstein metric, it seems that inverting the relevant cumula- 
tive distributions may be a good way to couple fine and coarse level simulations 
in multilevel implementations; 

• the multilevel approach could be used in applications which involve the trunca- 
tion of series expansions; 

• a two-level method combining single and double precision computations might 
provide useful savings, due to the lower cost of single precision arithmetic; 

• a multilevel approach for situations with multiple expectations to be estimated. 

Looking to the future, exciting areas for further research include: 

• more use of multilevel for nested simulations; 

• further investigation of multilevel quasi-Monte Carlo methods; 

• continued research on numerical analysis, especially for SPDEs; 

• development of multilevel estimators for new applications. 

For further information on multilevel Monte Carlo methods, see the webpage 

|http :77 peopl e .maths . ox . ac . uk/gilesm/mlmc-community . html| 
which lists the research groups working in the area, and their main publications. 
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